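In the multi-agent path finding (MAPF) problem, a set of agents moving on a graph must reach their respective destinations without inter-agent collisions. In practical MAPF applications such as navigation in automated warehouses, where there are occasionally hundreds of agents or more, MAPF must be solved iteratively on a lifelong basis. Such scenarios rule out simple adaptations of offline, compute-intensive optimal approaches; scalable sub-optimal algorithms are therefore used in these settings. An ideal scalable algorithm is applicable to iterative scenarios with predictable computation time and outputs plausible solutions. To this end, this study presents a novel algorithm, priority inheritance with backtracking (PIBT), that solves MAPF iteratively. PIBT relies on an adaptive prioritization scheme that focuses on the adjacent movements of multiple agents; hence it can be applied to several domains. We prove that, regardless of their number, all agents are guaranteed to reach their destinations within finite time when the environment is a graph in which all pairs of adjacent nodes belong to a simple cycle (e.g., a biconnected graph). Experimental results covering various scenarios, including a demonstration with real robots, reveal the benefits of the proposed method. Even with hundreds of agents, PIBT yields acceptable solutions immediately and can solve large instances that other de facto MAPF approaches cannot. In addition, PIBT outperforms an existing approach, in both runtime and solution quality, on an iterative scenario of conveying packages in an automated warehouse.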
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high-resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images, exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
Spatio-temporal modeling, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the underlying heterogeneity and non-stationarity implied in graph streams, in this study we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner powered by a Meta-Node Bank into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variety of non-stationary phenomena. Our model outperforms the state of the art by a large margin on all three datasets (over 27% in MAE and 34% in RMSE). In addition, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and adapt robustly to different anomalous situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
Interpretable entity representations (IERs) are sparse embeddings that are "human-readable" in that dimensions correspond to fine-grained entity types and values are predicted probabilities that a given entity is of the corresponding type. These methods perform well in zero-shot and low supervision settings. Compared to standard dense neural embeddings, such interpretable representations may permit analysis and debugging. However, while fine-tuning sparse, interpretable representations improves accuracy on downstream tasks, it destroys the semantics of the dimensions which were enforced in pre-training. Can we maintain the interpretable semantics afforded by IERs while improving predictive performance on downstream tasks? Toward this end, we propose Intermediate enTity-based Sparse Interpretable Representation Learning (ItsIRL). ItsIRL realizes improved performance over prior IERs on biomedical tasks, while maintaining "interpretability" generally and their ability to support model debugging specifically. The latter is enabled in part by the ability to perform "counterfactual" fine-grained entity type manipulation, which we explore in this work. Finally, we propose a method to construct entity type based class prototypes for revealing global semantic properties of classes learned by our model.
Traffic forecasting, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the spatio-temporal heterogeneity and non-stationarity implied in traffic streams, in this study we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner powered by a Meta-Node Bank into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a new large-scale traffic speed dataset that includes traffic incident information. Our model outperforms the state of the art by a large margin on all three datasets (over 27% in MAE and 34% in RMSE). In addition, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle road links and time slots with different patterns and adapt robustly to anomalous traffic situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
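Black-box optimization has potential in many applications, such as hyperparameter optimization in machine learning and optimization in the design of experiments. Ising machines are useful for binary optimization problems because each variable can be represented by a single binary variable of the Ising machine. However, conventional approaches using an Ising machine cannot handle black-box optimization problems with non-binary values. To overcome this limitation, we propose an approach for black-box optimization problems with integer variables that uses Ising/annealing machines and factorization machines in combination with three different integer-encoding methods. The approach is evaluated numerically with each encoding method on a simple problem: calculating the energy of the hydrogen molecule in its most stable state. The proposed approach can calculate the energy with any of the integer-encoding methods. However, one-hot encoding is useful for problems of small size.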
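As a rough illustration of what an integer encoding does here, the sketch below maps a bounded integer to binary variables and back. The abstract names only one-hot encoding; the binary (base-2) encoding shown next to it, and all function names, are assumptions added for comparison, not the authors' implementation.

```python
def one_hot_encode(value, lower, upper):
    """Encode an integer in [lower, upper] with one bit per candidate value."""
    if not lower <= value <= upper:
        raise ValueError("value out of range")
    return [1 if candidate == value else 0 for candidate in range(lower, upper + 1)]


def one_hot_decode(bits, lower):
    """Decode a one-hot bit list; exactly one bit must be set."""
    if sum(bits) != 1:
        raise ValueError("not a valid one-hot assignment")
    return lower + bits.index(1)


def one_hot_penalty(bits):
    """Quadratic penalty that is zero only for valid one-hot assignments.

    A term of this shape is commonly added to the objective so that the
    binary solver is discouraged from returning invalid bit patterns.
    """
    return (sum(bits) - 1) ** 2


def binary_encode(value, lower, num_bits):
    """Encode (value - lower) in base 2, least significant bit first."""
    offset = value - lower
    if not 0 <= offset < 2 ** num_bits:
        raise ValueError("value does not fit in num_bits")
    return [(offset >> k) & 1 for k in range(num_bits)]


def binary_decode(bits, lower):
    """Decode a least-significant-bit-first bit list."""
    return lower + sum(bit << k for k, bit in enumerate(bits))
```

One-hot encoding needs one bit per candidate value plus a constraint, while the binary encoding needs only logarithmically many bits, so the number of binary variables grows much faster with the integer range under one-hot encoding.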
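This paper presents a novel framework for social group activity recognition. As an extended task of group activity recognition, social group activity recognition requires recognizing multiple sub-group activities and identifying group members. Most existing methods tackle both tasks by refining region features and then summarizing them into activity features. Such heuristic feature design makes the effectiveness of the features susceptible to incomplete person localization and disregards the importance of scene context. Furthermore, region features are sub-optimal for identifying group members because they may be dominated by the features of the people in the region and have different semantics. To overcome these drawbacks, we propose to leverage the attention modules in transformers to generate effective social group features. Our method is designed so that the attention modules identify and then aggregate features relevant to social group activities, generating an effective feature for each social group. Group member information is embedded in the features and is thus accessible to feed-forward networks. The outputs of the feed-forward networks represent groups so concisely that group members can be identified with simple Hungarian matching between groups and individuals. Experimental results show that our method outperforms state-of-the-art methods on the Volleyball and Collective Activity datasets.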
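The matching step mentioned above is a minimum-cost assignment. The sketch below is a brute-force stand-in, not the Hungarian algorithm itself: it enumerates all assignments, so it finds the same optimal cost only for small inputs. The cost matrix layout (rows as predicted member slots, columns as detected individuals) and the function name are hypothetical.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (pairs, total) for the cheapest one-to-one row-to-column assignment.

    cost[i][j] is the cost of matching row i to column j.
    Requires len(cost) <= len(cost[0]); every row receives a distinct column.
    Brute force: factorial time, intended for illustration only.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many columns as rows")
    best_cols, best_total = None, float("inf")
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[row][col] for row, col in enumerate(cols))
        if total < best_total:
            best_cols, best_total = cols, total
    return list(enumerate(best_cols)), best_total


# Greedy matching would pair row 0 with column 0 (cost 1) and force row 1
# onto column 1 (cost 10); the optimal assignment swaps them.
pairs, total = min_cost_assignment([[1, 2], [1, 10]])
```

In practice a polynomial-time solver would replace the enumeration; the point of the example is only that the matching is chosen globally rather than greedily.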
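This paper proposes a novel framework that automatically captures the time-frequency characteristics of electroencephalogram (EEG) signals of human sleep, based on authoritative sleep medicine guidance. The framework consists of two parts. The first part extracts informative features by partitioning the input EEG spectrogram into a sequence of time-frequency patches. The second part is an attention-based architecture that efficiently searches, in parallel, for correlations between the partitioned time-frequency patches and the defining factors of sleep stages. The proposed pipeline is validated on the Sleep Heart Health Study dataset and achieves new state-of-the-art results for the wake, N2, and N3 stages, with F1 scores of 0.93, 0.88, and 0.87, respectively, using only EEG signals. The proposed method also has a high inter-rater reliability of 0.80 kappa. We also visualize the correspondence between sleep staging decisions and the features extracted by the proposed method, which gives our model strong interpretability.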
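For readers unfamiliar with the two reported metrics, the sketch below computes a per-stage F1 score and a kappa value from two label sequences. It assumes the kappa in question is Cohen's kappa, which the abstract does not state, and the function names and toy labels are made up for illustration.

```python
from collections import Counter


def f1_for_class(reference, predicted, label):
    """F1 score of one class, treating `reference` as ground truth."""
    true_pos = sum(r == label and p == label for r, p in zip(reference, predicted))
    predicted_pos = sum(p == label for p in predicted)
    actual_pos = sum(r == label for r in reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / predicted_pos
    recall = true_pos / actual_pos
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy example: four epochs scored by an expert and by a model.
expert = ["W", "W", "N2", "N2"]
model = ["W", "W", "N2", "N3"]
kappa = cohens_kappa(expert, model)          # observed 0.75, chance 0.375
f1_n2 = f1_for_class(expert, model, "N2")    # precision 1.0, recall 0.5
```

Kappa discounts the agreement two scorers would reach by chance given how often each uses every stage, which is why it is lower than raw accuracy on the same labels.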
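Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task that requires external world knowledge in order to correctly answer a text question about an associated image. Recent single-modality text work has shown that knowledge injection into pre-trained language models, specifically entity-enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks. In this work, we empirically study how and whether such methods, applied in a bi-modal setting, can improve an existing VQA system's performance on the KBVQA task. We experiment with two large publicly available VQA datasets: (1) KVQA, which contains mostly rare Wikipedia entities, and (2) OKVQA, which is less entity-centric and more aligned with common-sense reasoning. Both lack explicit entity spans, and we study the effect of different weakly supervised and manual methods for obtaining them. Additionally, we analyze how recently proposed bi-modal and single-modality attention explanations are affected by the incorporation of such entity-enhanced representations. Our results show substantially improved performance on the KBVQA task without the need for additional costly pre-training, and we provide insights into when entity knowledge injection helps improve a model's understanding. We provide code and enhanced datasets for reproducibility.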
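In this study, we propose a novel case-based similar image retrieval (SIR) method for hematoxylin and eosin (H&E)-stained histopathological images of malignant lymphoma. When a whole slide image (WSI) is used as the input query, it is desirable to retrieve similar cases by focusing on image patches in pathologically important regions, such as tumor cells. To address this problem, we employ attention-based multiple instance learning, which enables us to focus on tumor-specific regions when computing the similarity between cases. Moreover, we employ contrastive distance metric learning to incorporate immunohistochemical (IHC) staining patterns as useful supervised information for defining an appropriate similarity between heterogeneous malignant lymphoma cases. In an experiment with 249 malignant lymphoma patients, we confirmed that the proposed method achieves higher evaluation measures than baseline case-based SIR methods. Furthermore, subjective evaluation by pathologists showed that our similarity measure using IHC staining patterns is appropriate for representing the similarity of H&E-stained tissue images of malignant lymphoma.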